The NIFTY 50 index is the National Stock Exchange of India's benchmark, broad-based stock market index for the Indian equity market. NIFTY 50 stands for National Index Fifty and represents the weighted average of 50 Indian company stocks across 17 sectors. It is one of the two main stock indices used in India, the other being the BSE Sensex. (Source: Wikipedia)
The NIFTY 50 is a diversified 50-stock index covering 13 sectors of the economy (as on 30 April 2021). It is used for a variety of purposes, such as benchmarking fund portfolios, index-based derivatives, and index funds.
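The NIFTY 50 is computed with free-float market-cap weighting. As a toy sketch of that idea (all prices, share counts, and the divisor below are hypothetical, not actual index constituents or the official divisor):

```python
# Toy sketch of free-float market-cap weighting (hypothetical
# prices, share counts, and divisor -- not real NIFTY 50 data).
prices = {"A": 100.0, "B": 250.0, "C": 50.0}
free_float_shares = {"A": 1_000, "B": 400, "C": 2_000}
divisor = 1_000.0  # base-period scaling factor, chosen arbitrarily here

# index level = total free-float market cap / divisor
free_float_mcap = sum(prices[s] * free_float_shares[s] for s in prices)
index_level = free_float_mcap / divisor
print(index_level)  # (100*1000 + 250*400 + 50*2000) / 1000 = 300.0
```

The real index uses a base-period divisor so that the level is comparable over time; the mechanics above are the same.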

source:https://stableinvestor.com/2019/09/explained-nifty-indices-nifty50.html
From the above Nifty indices classification, we will concern ourselves only with the Nifty 50.
# Location of NSE India, Mumbai, from Google Maps: https://www.google.com/maps/search/nse+india+location+mumbai/@19.0601528,72.8597672,20.11z
nse_map = folium.Map(location=[19.060288332910762, 72.85984823112783], zoom_start=12, max_zoom=13)
tooltip = "National Stock Exchange of India"
folium.Marker(
[19.06028008587612, 72.85987316084454], popup="<i>National Stock Exchange</i>", tooltip=tooltip
).add_to(nse_map)
nse_map
This project performs exploratory data analysis on the Nifty 50 stock data. In my analysis I will try to extract some standard results useful to financial investors or companies. We use various Python libraries for data visualization.
The Libraries which are used in Project are:
To install all required libraries, run the following command:
pip install numpy pandas matplotlib seaborn plotly --upgrade
The Data visualization is the graphic representation of data.
It involves producing images that communicate relationships among the represented data to viewers.
Visualizing data is an essential part of data analysis and machine learning.
In this project, we'll use Python libraries like Matplotlib, Seaborn and Plotly to learn and apply some popular data visualization techniques.
The following tasks are implemented in the project:
The following steps are taken to accomplish the project:
This project is in the financial-market domain, so we have to obtain the data appropriately. The dataset is taken from www.kaggle.com (👈 click to download) and contains price series for all 50 stocks of the Nifty index (India).
The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from NSE (National Stock Exchange) India. All datasets are at day-level, with pricing and trading values split across .csv files for each stock, along with a metadata file with some macro-information about the stocks themselves. The data spans from 27th November, 2007 to 30th October, 2020.
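Since each stock ships as its own CSV, one way to build a combined frame is to read and concatenate them. A minimal sketch (it assumes each file is named `<SYMBOL>.csv`, matching this dataset's layout):

```python
import glob
import os
import pandas as pd

def load_all_stocks(data_dir):
    """Read every per-stock CSV in data_dir into one DataFrame.

    Assumes each file is named <SYMBOL>.csv; the metadata file
    (stock_metadata.csv) and the combined NIFTY50_all.csv are skipped.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        if name in ("stock_metadata", "NIFTY50_all"):
            continue
        df = pd.read_csv(path)
        df["Symbol"] = name  # remember which stock each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

The dataset also provides a pre-built `NIFTY50_all.csv`, which we use below; this sketch just shows how such a file can be assembled.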
Acknowledgements
NSE India: https://www.nseindia.com/
Thanks to NSE for providing all the data publicly.
Let's begin by downloading the data, and listing the files within the dataset.
dataset_url = 'https://www.kaggle.com/rohanrao/nifty50-stock-market-data'
The statements below download the dataset when running in a Kaggle/Colab notebook:
import opendatasets as od
dataset_url= 'https://www.kaggle.com/rohanrao/nifty50-stock-market-data'
od.download(dataset_url)
But we need the data on the machine where we will be running the notebook locally, so we prefer to download the data externally, extract it, and place it in the root project directory.
Now let's look at our dataset.
data_dir = './Stock_Market_Nifty_CSV/' #we will be storing all the dataset files in a data_dir
import os
os.listdir(data_dir)
# We will be including the libraries in this cell
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas_datareader.data as web
import plotly.graph_objs as go
import plotly.express as px
import folium
from plotly.offline import init_notebook_mode,iplot
import cufflinks as cf
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
Now we have successfully imported all the libraries we will use in our project.
Data Preprocessing
Here we display the dataset, which is in CSV format, in various ways. The first step is to load the data using pandas' read_csv function. The data is stored in a multidimensional table called a dataframe.
Now we will create a new dataframe that merges the price history of all 50 stocks with the extra stock-metadata information (sector, ISIN, etc.) useful for analysis, adding some extra columns derived from financial formulas.
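For the "extra columns derived from financial formulas," one common sketch is the daily percentage return and the intraday high-low range (the helper below is hypothetical; it assumes the `Close`, `High`, and `Low` columns of this dataset):

```python
import pandas as pd

def add_basic_indicators(df):
    """Add simple derived columns; assumes 'Close', 'High', 'Low' exist."""
    out = df.copy()
    out["Daily_Return_%"] = out["Close"].pct_change() * 100  # day-over-day change
    out["Intraday_Range"] = out["High"] - out["Low"]         # daily trading range
    return out

# tiny demo frame with made-up prices
demo = pd.DataFrame({"Close": [100.0, 110.0], "High": [105.0, 112.0], "Low": [95.0, 108.0]})
add_basic_indicators(demo)
```

The same helper can be applied to the merged dataframe after grouping by stock symbol, so returns are not computed across different companies.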
stock_metadata.csv : this file describes the Nifty50 datasets and explains the data present in each Nifty50 stock CSV.
Steps to follow :
- Load the File using Pandas
- Look at some information about data and the columns
- Fix missing and incorrect values if they are present
# loading stock_metadata.csv into a variable and checking its type
stock_metadata_df = pd.read_csv('./Stock_Market_Nifty_CSV/stock_metadata.csv')
type(stock_metadata_df)
stock_metadata_df
Now we have the dataframe for stock_metadata.csv. It has five columns:
Note : EQ - It stands for Equity.
stock_metadata_df.info()
# loading the nifty50_all.csv in a variable and checking its type
NIFTY50_all_df = pd.read_csv('./Stock_Market_Nifty_CSV/NIFTY50_all.csv')
NIFTY50_all_df
Now a description of the column names of the above dataframe.
Source: Investopedia.com
nse_india_df = NIFTY50_all_df.merge(stock_metadata_df, on="Symbol")
nse_india_df
# nse_india_df = NIFTY50_all_df + stock_metadata_df (combined for the sake of new column names)
# print all the column names of the nse_india_df
for col in nse_india_df.columns:
print(col)
#Count the total number of columns in the merged dataframe
len(nse_india_df.columns)
nse_india_df.shape
nse_india_df.isnull()
nse_india_df.isnull().sum()
nse_india_df.fillna(nse_india_df.mean(numeric_only=True), inplace=True)
nse_india_df.isnull().sum()
# This operation can take a while (approx. 25 minutes here) to fill all the empty cells.
# numeric_only=True restricts the means to numeric columns, leaving text columns untouched.
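Mean imputation only makes sense for numeric columns. A minimal toy sketch (hypothetical data) of the pattern used here:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, None, 30.0], "symbol": ["A", "B", "C"]})
# numeric_only=True restricts the column means to numeric dtypes,
# so string columns like 'symbol' are left untouched.
df = df.fillna(df.mean(numeric_only=True))
print(df["price"].tolist())  # [10.0, 20.0, 30.0]
```

Note that mean imputation is a blunt instrument for time-series prices; forward-filling per stock would usually be more faithful to the data.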
nse_india_df.Close.max()
nse_india_df.Close.min()
# Extract the 'Close' column (with a fresh 0..n row index) from the nse_india_df dataframe,
# then store it in nse_india_viz and visualize it with a line plot.
nse_india_viz = nse_india_df.reset_index()['Close']
plt.figure(figsize=(15,5))
plt.plot(nse_india_viz)
plt.title('Closing price across the whole merged dataset')
plt.xlabel('Row number')
plt.ylabel('Close value')
plt.show()
We have successfully Plotted the First visualization using Matplotlib.
Using the entire dataset of 207,850 rows × 19 columns...
⚠️ But there are a few disadvantages if we move further with it ⚠️
- Very dense plots
- Incomplete fitting of rows in the figure
- Irregular patterns obscuring the insights of the data
- Key assumptions will be lost
- A false impression of the data
To overcome these issues, we will use one single company instead of all companies as we dive further into the analysis.
| A | B | C | D | E |
|---|---|---|---|---|
| ADANIPORTS | ASIANPAINT | AXISBANK | BAJAJ-AUTO | BAJAJFINSV |
| BAJFINANCE | BHARTIARTL | BPCL | BRITANNIA | CIPLA |
| COALINDIA | DRREDDY | EICHERMOT | GAIL | GRASIM |
| HCLTECH | HDFC | HDFCBANK | HEROMOTOCO | HINDALCO |
| HINDUNILVR | ICICIBANK | INDUSINDBK | INFRATEL | INFY |
| IOC | ITC | JSWSTEEL | KOTAKBANK | LT |
| M&M | MARUTI | NESTLEIND | NTPC | ONGC |
| POWERGRID | RELIANCE | SBIN | SHREECEM | SUNPHARMA |
| TATAMOTORS | TATASTEEL | TCS | TECHM | TITAN |
| ULTRACEMCO | UPL | VEDL | WIPRO | ZEEL |
These are the symbols of all companies in the Nifty50. We will get a detailed analysis of each company individually.
c_list = ['ADANIPORTS','ASIANPAINT','AXISBANK','BAJAJ-AUTO','BAJAJFINSV','BAJFINANCE','BHARTIARTL','BPCL','BRITANNIA','CIPLA',
'COALINDIA','DRREDDY','EICHERMOT','GAIL','GRASIM','HCLTECH','HDFC','HDFCBANK','HEROMOTOCO','HINDALCO','HINDUNILVR',
'ICICIBANK','INDUSINDBK','INFRATEL','INFY','IOC','ITC','JSWSTEEL','KOTAKBANK','LT','M&M','MARUTI','NESTLEIND','NTPC',
'ONGC','POWERGRID','RELIANCE','SBIN','SHREECEM','SUNPHARMA','TATAMOTORS','TATASTEEL','TCS','TECHM','TITAN','ULTRACEMCO',
'UPL','VEDL','WIPRO','ZEEL']
c_name = input('Enter the Company name to perform the Analysis : ')
if c_name.upper() in c_list:
    comp = c_name.upper()
else:
    raise ValueError('The given name is not present in the Nifty50 data')
comp_var = './Stock_Market_Nifty_CSV/'+comp+'.csv'
comp_var
comp_df = pd.read_csv(comp_var)
comp_df
print('The type of this dataframe is : ',type(comp_df))
print('Rows and Columns in the Dataframe :')
comp_df
print('Number of rows and columns in the dataframe : ')
comp_df.shape
print('The dimension of our dataframe : ')
comp_df.ndim
comp_df.info()
#Descriptive Statistics
print('The tabular Description of the dataframe')
comp_df.describe()
print('The correlation among all the numeric columns')
comp_df.corr(numeric_only=True)
print('Print top five rows dataframe : ')
comp_df.head(5)
print('Print bottom five rows of dataframe : ')
comp_df.tail(5)
Handling Null values
comp_df.isnull().sum()
numerics = ['int32', 'int64', 'float32', 'float64']
xp = comp_df.select_dtypes(include = numerics)
print('Total number of columns : ',len(comp_df.columns))
print('Numeric type columns : ',len(xp.columns))
print('Non-numeric/Object type columns : ',len(comp_df.columns) - len(xp.columns))
comp_df.fillna(comp_df.mean(), inplace=True)
comp_df.isnull().sum()
#comp_df[col_name].fillna('char', inplace=True)
#comp_df.isnull().sum()
a1 = comp_df.Date.max()
a2 = comp_df.Date.min()
a3 = comp_df.High.max()
a4 = comp_df.Low.min()
a5 = comp_df.Volume.mean()
a6 = comp_df.Turnover.max()
a7 = comp_df.Series.unique()[0]
a8 = comp_df.Last.max()
d1 = comp_df[comp_df['High'] == comp_df.High.max()]['Date']
d2 = comp_df[comp_df['Low'] == comp_df.Low.min()]['Date']
print('What is the starting date of '+comp+' :', a2)
print('What is the last date in the data of '+comp+' :', a1)
print('\nHighest recorded stock value of '+comp+' :', a3)
print('On this date:', d1.iloc[0])
print('\nLowest recorded stock value of '+comp+' :', a4)
print('On this date:', d2.iloc[0])
print('\nOver the whole period, the average Volume of '+comp+' :', a5)
print('\nHighest total value of stock traded in a single period (Turnover) of '+comp+' :', a6)
print('\nThe security type offered for the '+comp+' company by Nifty50 is :', a7)
print('\nThe maximum recorded "Last" traded price of '+comp+' :', a8)
This ends our preprocessing task; now we move on to the actual data analysis and visualization part.
To explain EDA in simple terms: we try to understand the given data much better, so that we can make some sense out of it.
In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
Here we mainly concentrate on data visualization. In Python there are mainly three libraries for it: Matplotlib, Seaborn, and Plotly. Using these we will perform the EDA. Navigating into the data: we start by plotting a simple series of closing prices for a single stock; all other analyses work on groups of data.
comp_df.columns
For the analysis, we will select specific columns out of the 15 columns and generate insights/plots.
Plotting the 'Close' price of the single selected company over a chosen range of rows.
plt.figure(figsize=(15,5))
an_1 = comp_df['Close'][1500:2600] # row range out of 0 to 5306
plt.title('Closing price of the selected company')
plt.xlabel('Row number')
plt.ylabel('Close value')
plt.plot(an_1)
comp_df['Date'] = pd.to_datetime(comp_df['Date'])
show_month = comp_df.set_index('Date')
show_month['Close'].loc['2012'].plot(figsize=(15,7), color='Red', linestyle='--')
plt.title("Stock Close Price in 2012")
plt.ylabel('Close Value')
plt.show()
comp_df['Open'][:2600].plot(legend=True, figsize=(13,6), grid=True)
plt.legend(bbox_to_anchor=(1.1, 0.75))
plt.ylabel('Open value')
plt.xlabel('Row number (first 2600 trading days)')
plt.title("Stock Open Price")
plt.autoscale()
plt.show()
fig = px.line(comp_df[5000:], x='Date', y='Volume', title='Volume of stock on a particular date')
fig
fig = px.line(comp_df, x='Date', y='Last', title=' ')
fig.update_xaxes(
rangeslider_visible=True,
rangeselector=dict(
buttons=list([
dict(count=1, label="1m", step="month", stepmode="backward"),
dict(count=6, label="6m", step="month", stepmode="backward"),
dict(count=1, label="YTD", step="year", stepmode="todate"),
dict(count=1, label="1y", step="year", stepmode="backward"),
dict(step="all")
])
)
)
fig.update_layout(plot_bgcolor='rgb(250, 242, 242)',
title='NIFTY_50 : Major single day falls in '+comp+' from 2000 onwards',
yaxis_title='NIFTY 50 Stock',
shapes = [dict(x0='2020-03-23', x1='2020-03-23', y0=0, y1=1, xref='x', yref='paper', line_width=2,opacity=0.3,line_color='red',editable=False),
dict(x0='2019-09-3', x1='2019-09-3', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='red'),
dict(x0='2020-02-1', x1='2020-02-1', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='red'),
dict(x0='2020-03-12', x1='2020-03-12', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='red')],
annotations=[dict(x='2020-03-23', y=0.5, xref='x', yref='paper',
showarrow=False, xanchor='left', text='Lockdown Phase-1 announced'),
dict(x='2019-09-3', y=0.05, xref='x', yref='paper',
showarrow=False, xanchor='left', text='Multiple PSU Bank Merger Announcements'),
dict(x='2020-02-1', y=0.5, xref='x', yref='paper',
showarrow=False, xanchor='right', text='Union Budget, coronavirus pandemic'),
dict(x='2020-03-12', y=0.3, xref='x', yref='paper',
showarrow=False, xanchor='right', text='Coronavirus declared Pandemic by WHO')]
)
fig.show()
fig = go.Figure()
fig.add_trace(go.Scatter(
x=comp_df['Date'][500:1300],
y=comp_df['High'][500:1300],
name='High Price',
line=dict(color='blue'),
opacity=0.8))
fig.add_trace(go.Scatter(
x=comp_df['Date'][500:1300],
y=comp_df['Low'][500:1300],
name='Low Price',
line=dict(color='orange'),
opacity=0.8))
fig.update_layout(title_text='Comparison of High vs Low prices of the '+comp+' company of Nifty50',
plot_bgcolor='rgb(250, 242, 242)',yaxis_title='Low and High Values',xaxis_title='Years')
fig.show()
data= comp_df[['Volume','High']][1200:1400]
type(data)
# draw jointplot with
# kde kind
sns.jointplot(x = "Volume", y = "High",kind = "kde", data = data, color='Red')
# show the plot
plt.show()
data= comp_df[['Volume','Low']][1000:1400]
type(data)
# draw jointplot with
# scatter kind
sns.jointplot(x = "Volume", y = "Low",kind = "scatter", data = data, color='Navy')
# show the plot
plt.show()
data= comp_df[['Volume','Turnover']][1300:1400]
type(data)
# draw jointplot with
# hex kind
sns.jointplot(x = "Volume", y = "Turnover",kind = "hex", data = data, color='Green')
# show the plot
plt.show()
# Comparing High to Close of the same stock shows a nearly linear relationship
sns.jointplot(x='High', y='Close', data=comp_df[1300:1400], kind='scatter', color='seagreen')
sns.pairplot(comp_df[1200:1300][['High','Low','Open','Close','Volume']])
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
data = pd.read_csv('./Stock_Market_Nifty_CSV/ITC.csv')
def animate(i):
    # redraw the Close price up to row i on every frame
    plt.cla()
    plt.plot(data['Close'][:i])
    plt.xlabel('Time')
    plt.ylabel('Price')
    plt.title('ITC')
    plt.gcf().autofmt_xdate()
    plt.tight_layout()
ani = FuncAnimation(plt.gcf(), animate, frames=len(data), interval=200)
plt.tight_layout()
plt.show()
ani
import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np
t = comp_df[2800:3000].High
x = comp_df[2800:3000].Volume
fig, ax = plt.subplots()
# scale the axes to the data range instead of fixed [0, 2*pi] limits
ax.axis([t.min(), t.max(), x.min(), x.max()])
l, = ax.plot([], [])
def animate(i):
    l.set_data(t.iloc[:i], x.iloc[:i])
ani = matplotlib.animation.FuncAnimation(fig, animate, frames=len(t))
from IPython.display import HTML
HTML(ani.to_jshtml())
plt.rcParams["animation.html"] = "jshtml"
ani
comp_df[:500]["High"].iplot(kind="histogram", bins=170, theme="white", title="Distribution of High prices", xTitle='High value for '+comp+' (first 500 rows)', yTitle='Count')
# we will check the correlation between numeric features in the dataset
my_title = 'Correlation matrix for the features/columns present in '+comp+' using a horizontal bar type'
comp_df.corr(numeric_only=True).iplot(kind="bar", title=my_title)
comp_df.corr(numeric_only=True).iplot(kind='heatmap',
colorscale='Blues',
title="Feature correlation matrix present in "+comp)
df_genre = comp_df[100:2000:85].groupby('Date')
def genreBased(comp_df_feature):
xrange = np.arange(1,len(df_genre.sum())+1)
fig,ax= plt.subplots(ncols=2,figsize=(18,6))
df_to_plot = df_genre.sum().sort_values(by=comp_df_feature,ascending =False)[::-1]
df_to_plot[comp_df_feature].plot(kind='barh')
plt.title(comp_df_feature)
#labels
ax[1].set_ylabel(None)
ax[1].tick_params(axis='both', which='major', labelsize=15)
ax[1].set_xlabel('', fontsize=15,labelpad=21)
#spines
ax[1].spines['top'].set_visible(False)
ax[1].spines['right'].set_visible(False)
ax[1].grid(False)
#annotations
for x,y in zip(np.arange(len(df_genre.sum())+1),df_genre.sum().sort_values(by=comp_df_feature,ascending =False)[::-1][comp_df_feature]):
label = "{:}".format(y)
labelr = round(y,2)
plt.annotate(labelr, # this is the text
(y,x), # this is the point to label
textcoords="offset points",# how to position the text
xytext=(6,0), # distance from text to points (x,y)
ha='left',va="center")
#donut chart
theme = plt.get_cmap('Blues')
ax[0].set_prop_cycle("color", [theme(1. * i / len(df_to_plot))for i in range(len(df_to_plot))])
wedges, texts,_ = ax[0].pie(df_to_plot[comp_df_feature], wedgeprops=dict(width=0.45), startangle=-45,labels=df_to_plot.index,
autopct="%.1f%%",textprops={'fontsize': 13,})
plt.tight_layout()
genreBased('Open') #ABOVE
print()
genreBased('Close') #BELOW
sns.boxplot(x="Volume", y="Symbol", data=comp_df[1000:1500])
#Distplot Method
df_name = ['High', 'Low']
j = 0
df_lst = [comp_df.High,
comp_df.Low]
for i in df_lst:
plt.figure(figsize=(16,5))
sns.distplot(i)
plt.title('Distplot for the '+str(df_name[j])+' price of the stock', fontdict={'fontsize':24})
j += 1
df = comp_df
fig = px.scatter_3d(df, x=comp_df[500:1000:13].Open,
y=comp_df[500:1000:13].Close,
z=comp_df[500:1000:13].High,
color=comp_df[500:1000:13].Date)
fig.show()
color_set = ['aggrnyl','agsunset','blackbody','bluered','blues','blugrn','bluyl','brwnyl','bugn','bupu','burg','burgyl','cividis','darkmint','electric','emrld','gnbu','greens','greys','hot','inferno','jet','magenta','magma','mint','orrd','oranges','oryel','peach','pinkyl','plasma','plotly3','pubu','pubugn','purd','purp','purples','purpor','rainbow','rdbu','rdpu','redor','reds','sunset','sunsetdark','teal','tealgrn','turbo','viridis','ylgn','ylgnbu','ylorbr','ylorrd','algae','amp','deep','dense','gray','haline','ice','matter','solar','speed','tempo','thermal','turbid','armyrose','brbg','earth','fall','geyser','prgn','piyg','picnic','portland','puor','rdgy','rdylbu','rdylgn','spectral','tealrose','temps','tropic','balance','curl','delta','oxy','edge','hsv','icefire','phase','twilight','mrybm','mygbm']
my_col = random.choice(color_set)
t = np.linspace(0, 4000, 600)  # length must match the 600 rows sliced below
x, y, z = pd.Series(comp_df[:600].Volume), pd.Series(comp_df[:600].Open), t
fig = go.Figure(data=[go.Scatter3d(
x=x,
y=y,
z=z,
mode='markers',
marker=dict(
size=7,
color=z, # set color to an array/list of desired values
colorscale=my_col, # choose a colorscale
opacity=0.8
)
)])
# tight layout
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))
fig.show()
# keep only numeric columns: go.Surface cannot plot strings like Date/Symbol
z_data = comp_df[:500].select_dtypes(include='number')
z = z_data.values
sh_0, sh_1 = z.shape
x, y = np.linspace(0, 10, sh_1), np.linspace(0, 10, sh_0)  # x spans columns, y spans rows
fig = go.Figure(data=[go.Surface(x=x, y=y, z=z)])
fig.update_layout(title='Surface view of the numeric columns of '+comp, autosize=False,
width=650, height=650,
margin=dict(l=6, r=5, b=0, t=0))
fig.show()
Finally we have come to the end of the EDA. We have gathered a few great patterns from the data of a specific Nifty50 company, but this need not be the end; still more patterns and insights can be gathered.
Now moving further to the Next Step in our Project.
from sklearn.model_selection import train_test_split
from datetime import datetime, date
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from matplotlib import pyplot as plt
from sklearn.metrics import mean_squared_error
# We train on the per-company price history loaded earlier
# The more data we have, the better results we get!
start = datetime(2016, 1, 1)
end = date.today()
df = comp_df[1:]
# regress the closing price on a numeric timestamp derived from the date
prices = df[['Date', 'Close']].copy()
prices["timestamp"] = pd.to_datetime(prices.Date).values.astype(float)
prices = prices.drop(['Date'], axis=1)
prices
dataset = prices.values
X = dataset[:,1].reshape(-1,1)  # timestamp
Y = dataset[:,0:1]              # Close price
validation_size = 0.15
seed = 7
X_train, X_validation, Y_train, Y_validation = train_test_split(X, Y, test_size=validation_size, random_state=seed)
# Test options and evaluation metric
num_folds = 10
seed = 7
scoring = "r2"
# Spot-Check Algorithms
models = []
models.append((' LR ', LinearRegression()))
models.append((' LASSO ', Lasso()))
models.append((' EN ', ElasticNet()))
models.append((' KNN ', KNeighborsRegressor()))
models.append((' CART ', DecisionTreeRegressor()))
models.append((' SVR ', SVR()))
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# evaluate each model in turn
results = []
names = []
for name, model in models:
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
# print(cv_results)
results.append(cv_results)
names.append(name)
msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
print(msg)
# Future prediction: add dates here for which you want to predict
dates = ["2021-12-23", "2022-12-24", "2023-12-25", "2024-12-26", "2025-12-27"]
# convert each date to a timestamp (note: np.append returns a new array,
# so the results must be collected rather than discarded)
future_X = []
for dt in dates:
    datetime_object = datetime.strptime(dt, "%Y-%m-%d")
    future_X.append(datetime.timestamp(datetime_object))
future_X = np.array(future_X).reshape(-1, 1)
# Define the model
model = DecisionTreeRegressor()
# Fit the model
model.fit(X_train, Y_train)
# predict on the timestamp features X (not on Y)
predictions = model.predict(X)
print(mean_squared_error(Y, predictions))
# %matplotlib inline
fig= plt.figure(figsize=(12,5))
plt.plot(X,Y)
plt.show()
Standard questions for financial investors.
Here we will use the stock-metadata CSV and the combined CSV of all Nifty50 companies to answer these questions: a single company's file is fine for plotting, but for answering queries across the market, the large merged CSV is the way to go.
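The query pattern used for these questions (group, aggregate, sort) can be sketched on toy data (made-up industries and volumes):

```python
import pandas as pd

# hypothetical rows standing in for the merged NIFTY 50 frame
trades = pd.DataFrame({
    "Industry": ["IT", "IT", "ENERGY", "ENERGY"],
    "Volume":   [100,  300,  50,       150],
})
# Average volume per industry, highest first -- the same shape of
# query as the ranking questions answered below.
ranking = trades.groupby("Industry")["Volume"].mean().sort_values(ascending=False)
print(ranking)
```

Swapping in a different aggregation (`sum`, `std`, or a dict via `.agg`) gives the other rankings computed in this section.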
# Ranking list over Industry
quest_1a = nse_india_df.groupby('Industry')[['Volume', 'Trades']].mean()
quest_1a.sort_values(by=['Volume'], ascending=False)
# Ranking list over Company name
quest_1b = nse_india_df.groupby('Company Name')[['Volume', 'Trades']].mean().head(5)
quest_1b.sort_values(by=['Volume'], ascending=False)
quest_2 = nse_india_df.groupby(['Company Name']).agg({'Trades': ['sum', 'mean']})
quest_2.columns = ['Trades_sum','Trades_mean']
quest_2.sort_values(by=['Trades_sum'], ascending=False).head(10)
quest_3_a = nse_india_df.groupby('Company Name')[['Company Name','Date', 'Close']].head(1)
quest_3_b = nse_india_df.groupby('Company Name')[['Company Name','Date', 'Close']].tail(1)
quest_3 = quest_3_a.merge(quest_3_b, on="Company Name")
quest_3["Price_%_S-I"] = ((quest_3["Close_y"] - quest_3["Close_x"])/quest_3["Close_x"])*100
# Top 5 Best Companies as per Price Percentage since its beginning
quest_3.sort_values(by=['Price_%_S-I'], ascending=False).head(5)
# Top 5 Worst Companies as per Price Percentage since its beginning
quest_3.sort_values(by=['Price_%_S-I'], ascending=True).head(5)
quest_4_a = nse_india_df.groupby(['Company Name'])['Close'].std() / nse_india_df.groupby(['Company Name'])['Close'].mean()
quest_4_a.to_frame().sort_values(by = 'Close', ascending=False)
# Now applying the same operation on All Unique Industries upon High gained Stock
quest_4_b = nse_india_df.groupby(['Industry'])['High'].std() / nse_india_df.groupby(['Industry'])['High'].mean()
quest_4_b.to_frame().sort_values(by = 'High', ascending=False)
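The volatility measure computed above is the coefficient of variation (standard deviation divided by the mean), which makes stocks of very different price levels comparable. On toy numbers:

```python
import pandas as pd

# made-up closing prices for one stock
s = pd.Series([10.0, 12.0, 8.0, 10.0])
cv = s.std() / s.mean()  # relative volatility, unitless
print(round(cv, 4))
```

A higher coefficient of variation means the price swings more relative to its own average level.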
#Nifty 50 Index/Industry Composition
quest_5 = stock_metadata_df.Industry.value_counts()  # new name: quest_3 is already in use above
plt.title("Nifty 50 Index composition")
plt.pie(quest_5, labels=quest_5.index, autopct='%1.1f%%', startangle=180, radius=1.72);
It was great to analyze stock market data from the NSE. Since inception, most stocks have made great gains; it is striking that the best performers are not among the top 5 most-traded stocks (which, in fact, is in line with Warren Buffett's guidance).
The analysis could be more interesting with a market-cap column, which is missing from this dataset.
This analysis could be a starting point for people interested in financial-market data. The dataset is still missing some core information (like the market capitalization of each stock), but that information could perhaps be found using other data providers (such as Yahoo Finance), and the analysis could then become a useful instrument for trading activity.
Links for tips: https://stackoverflow.com/ https://pandas.pydata.org/